zFLoRA: Zero-Latency Fused Low-Rank Adapters
arxiv.org·1d
⚡ONNX Runtime
A hitchhiker's guide to CUDA programming
🎯GPU Kernels
TinyML is the most impressive piece of software you can run on any ESP32
xda-developers.com·1d
⚡ONNX Runtime
An intro to the Tensor Economics blog
lesswrong.com·3d
🏎️TensorRT
The next RISC-V processor frontier: AI
edn.com·1d
🧠CPU Architecture
Inference Acceleration from the Ground Up
semiwiki.com·3d
🧠CPU Architecture
Review of Intel-based UP AI development kits – Part 1: Unboxing and first boot to Ubuntu Pro 24.04
cnx-software.com·10h
🔍Nsight
Duality-Based Fixed Point Iteration Algorithm for Beamforming Design in ISAC Systems
arxiv.org·1d
🔗Kernel Fusion
Sparse Adaptive Attention “MoE”: How I Solved OpenAI’s $650B Problem With a £700 GPU
⚡Flash Attention
Opportunistically Parallel Lambda Calculus
💡LSP
AI efficiency advances with spintronic memory chip that combines storage and processing
techxplore.com·3d
⚡Flash Attention
Accelerating AI inferencing with external KV Cache on Managed Lustre
cloud.google.com·1d
⚡ONNX Runtime
VerfCNN, Optimal Complexity zkSNARK for Convolutional Neural Networks
eprint.iacr.org·2d
🧮cuDNN
How fast can an LLM go?
🏎️TensorRT
Custom Intelligence: Building AI that matches your business DNA
aws.amazon.com·1d
🤖AI Coding Tools